HDI - What a time to be alive

2022

Using the United Nation’s Human Development Index (HDI) data to produce maps and other charts.

Emma Perez Hernandez
2023-01-06

The Economist published this map in September 2022 in order to highlight that ‘living standards are moving in the wrong direction’. It is striking how most countries in the world have worsened their scores in the Human Development Index (HDI) between 2019 and 2021, which makes the map mainly red. The HDI, calculated by the United Nations since 1990, is a summary measure of average achievement in three key dimensions of human development: a long and healthy life, being knowledgeable and have a decent standard of living.

The aim of this tutorial is to reproduce the map and propose an alternative visualization using the same dataset.

Getting and cleaning data

All the data needed for this map can be downloaded directly from the UNDP website. https://hdr.undp.org/data-center/documentation-and-downloads They provide a complete data set with the latest ranking (2021), the index value for each country since 1990 and its components. Since the map only shows the change between 2019 and 2021, I select a subset of the data to compute the growth change in the required period, the country name and the ISO3 code.

HDI <- read.csv("https://hdr.undp.org/sites/default/files/2021-22_HDR/HDR21-22_Composite_indices_complete_time_series.csv")

my_data <- HDI %>% select(country, iso3, hdi_2019, hdi_2021) %>% 
  mutate(change = ((hdi_2021/hdi_2019) -1)*100) %>% 
  mutate(change = round(change,1))

To draw the world map, the giscoR packaged is used to get the geographic information of each country in an sf format, which is required to use geom_sf in ggplot.

world <- giscoR::gisco_get_countries()
class(world)
[1] "sf"         "data.frame"

Now, the geographical information and the HDI data can be joined by the ISO3 code to form the complete dataset. This dataset must also be class sf, so I use st_as_sf to keep this format. Then, the Antarctica is removed from the final data, since this region does not appear in the original map.

#checking that both data sets contain the same countries 
hdi_data <- HDI %>% distinct(iso3) %>% as_vector()
nat_data <- world$ISO3_CODE
differ <-as.data.frame(setdiff(hdi_data, nat_data))

all_data <- st_as_sf(left_join(world, my_data, by = c("ISO3_CODE" = "iso3")))

all_data <- all_data %>% filter(!ISO3_CODE %in% c("ATA"))

summary(all_data)
   CNTR_ID           NAME_ENGL          ISO3_CODE        
 Length:256         Length:256         Length:256        
 Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character  
                                                         
                                                         
                                                         
                                                         
  CNTR_NAME             FID              country         
 Length:256         Length:256         Length:256        
 Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character  
                                                         
                                                         
                                                         
                                                         
    hdi_2019         hdi_2021          change       
 Min.   :0.3930   Min.   :0.3850   Min.   :-5.2000  
 1st Qu.:0.6105   1st Qu.:0.5995   1st Qu.:-1.5000  
 Median :0.7460   Median :0.7390   Median :-0.9000  
 Mean   :0.7275   Mean   :0.7206   Mean   :-0.9838  
 3rd Qu.:0.8435   3rd Qu.:0.8350   3rd Qu.:-0.3000  
 Max.   :0.9620   Max.   :0.9620   Max.   : 2.6000  
 NA's   :65       NA's   :65       NA's   :65       
          geometry  
 MULTIPOLYGON :134  
 POLYGON      :122  
 epsg:4326    :  0  
 +proj=long...:  0  
                    
                    
                    

Basic elements

Map and main legend

To start with, the map is plotted by using geom_sf, which automatically interprets the geographical data. The fill aesthetic is set to the variable “change”, and the proper scale fill is added to fix the same breaks and colors than The Economist. The guide for it is guide_colorsteps. Color is fixed in white and the size in 0.1 to generate the lines for countries’ contour. Theme_map is added, since it sets all chart elements in blank but for the map.

m <- ggplot(data = all_data) +  
  geom_sf(aes(fill = change), color = "white", size = 0.1) +
  scale_fill_stepsn(breaks = c(-4,-2,0,2),
                    colors = c("#be0f08", "#f6423c", "#ffa9a6", "#a4bfd6", "#1e5c98"), 
                    na.value = "#e0ded4",
                    guide = guide_colorsteps(even.steps = TRUE, 
                                             order = 1,
                                             ticks = TRUE, 
                                             ticks.colour= "black",
                                             ticks.linewidth = 1,
                                             direction = "horizontal",
                                             title = NULL,
                                             draw.llim = TRUE,
                                             draw.ulim = TRUE,
                                             frame.color = "black",
                                             label.theme = element_text(size= 15))) +
  theme_map()

m

Second legend

Since the previous legend does not show the missing values, another legend has to be configured for them. To do so, I get a very small region (Gibraltar), and add it as another map layer with color aesthetic in blank. For this aes, a scale color is defined in which a guide_bins is added specifying the proper color in the override.aes argument.

Gibraltar <- all_data %>% filter(ISO3_CODE == "GIB")

m <- m + geom_sf(data = Gibraltar, aes(color = ""))+
  scale_color_discrete(guide=guide_bins(
                             axis = FALSE,
                             label = FALSE,
                             title = "No data",
                             title.position = "right",
                             title.vjust = 1,
                             title.hjust = 0,
                             title.theme = element_text(size = 16, 
                                                        color = "grey50"),
                             override.aes = list(color =  "#e0ded4", fill = "#e0ded4"),
                             order = 2, 
                             direction = "horizontal")) + 
    theme(legend.position = c(0.63,0.95),
        legend.key = element_rect(color = "white"),
        legend.key.height = unit(2.2,"mm"),
        legend.key.width = unit(4.7,"mm"),
        legend.margin = margin(0,0,0,0),
        legend.direction = "horizontal",
        legend.box = "horizontal", 
        legend.spacing = unit(4.7,"mm"),
        legend.spacing.y = unit(1,"mm"),
        legend.background = element_blank())
m

To add the black points place in the center of each of the countries annotated, we can filter them from the dataset and get their geographical centers with st_point_on_surface

Another dataframe is created for Australia

points <- all_data %>% filter(ISO3_CODE %in% c("LBN", "BGD", "VEN", "NAM"))
countries_center <- st_point_on_surface(points)

australia <- all_data %>% filter(ISO3_CODE == "AUS")

Then, we can add them in two different layers

m <-  m + geom_sf_text(data = australia, aes(label = paste(NAME_ENGL, "1.1")), 
               size = 4.7, 
               color = "#1e5c98")+
  geom_sf(data = countries_center, size = 0.5) 

m

Projection and annotations

Projection

To this point, all the geographic information available in our dataset has been used, but the map does not look the same than the original because by default the projection used by geom_sf is …… Since the one used by The Economist is the Robinson projection, the coordinate system of the map has to be changed in accordance.

m <- m +
  coord_sf(crs = st_crs("+proj=robin"))

m

Spatial coordinates

After changing the projection, all the following annotations must be in the same Coordinate Reference System, so a new sf object is created with the standard coordinates (latitude and longitude) of each country to be annotated and other for the end of the segment linked to it. Then, they are transformed to the Robinson projection.

lebanon <- st_sfc(st_point(c(35.49,33.9)), crs=4326)
lebanon <- st_transform(lebanon, "+proj=robin")
lebanon_end <- st_sfc(st_point(c(-14,33.9)), crs=4326)
lebanon_end <-  st_transform(lebanon_end, "+proj=robin")

namibia <-  st_sfc(st_point(c(17.7,-23)), crs=4326)
namibia <- st_transform(namibia, "+proj=robin")
namibia_end <- st_sfc(st_point(c(8,-23)), crs=4326)
namibia_end <- st_transform(namibia_end, "+proj=robin")

venezuela <-  st_sfc(st_point(c(-65,6.5)), crs=4326)
venezuela <- st_transform(venezuela, "+proj=robin")
venezuela_end <- st_sfc(st_point(c(-85,6.5)), crs=4326)
venezuela_end <- st_transform(venezuela_end, "+proj=robin")

bangladesh <-  st_sfc(st_point(c(90,23.7)), crs=4326)
bangladesh <- st_transform(bangladesh, "+proj=robin")
bangladesh_end <- st_sfc(st_point(c(130,23.7)), crs=4326)
bangladesh_end <- st_transform(bangladesh_end, "+proj=robin")

Annotations

With the elements above, segments and text annotations can be put in place using annotate. The name and the change value of each country are used as labels, and their corresponding colors are also added.

m <- m + annotate("segment", x=lebanon[[1]][1], y=lebanon[[1]][2],
           xend=lebanon_end[[1]][1], yend=lebanon_end[[1]][2], size = 0.5) +
  annotate("text", x=lebanon_end[[1]][1], y=lebanon_end[[1]][2], 
           label = "Lebanon -5.2",
           hjust = 1.05,
           color = "#be0f08",
           size = 4.7) +
  annotate("segment", x=namibia[[1]][1], y=namibia[[1]][2],
           xend=namibia_end[[1]][1], yend=namibia_end[[1]][2], size = 0.5) +
  annotate("text", x=namibia_end[[1]][1], y=namibia_end[[1]][2], 
           label = "Namibia -3.8",
           hjust = 1.05,
           color = "#f6423c",
           size = 4.7)+
  annotate("segment", x=venezuela[[1]][1], y=venezuela[[1]][2],
           xend=venezuela_end[[1]][1], yend=venezuela_end[[1]][2], size = 0.5) +
  annotate("text", x=venezuela_end[[1]][1], y=venezuela_end[[1]][2], 
           label = "Venezuela -4.2",
           hjust = 1.05,
           color = "#be0f08",
           size = 4.7)+
  annotate("segment", x=bangladesh[[1]][1], y=bangladesh[[1]][2],
           xend=bangladesh_end[[1]][1], yend=bangladesh_end[[1]][2], size = 0.5)+
  annotate("text", x=bangladesh_end[[1]][1], y=bangladesh_end[[1]][2], 
           label = "Bangladesh 2.6",
           hjust = -0.05,
           color = "#1e5c98",
           size = 4.7)

m

Final result

Lastly, adding the proper title, subtitle and caption, and placing them in the same positions, the final map is achieved. The font used is “Open Sans”, which can be downloaded from Google Fonts.

m + labs(title = "What a time to be alive",
       subtitle = "Change in UN Human Development Index, 2019-21, %",
       caption = "Source: United Nations")+
  theme(plot.title = element_text(face="bold", 
                                  family = "Open Sans",
                                  size = 16, 
                                  hjust = 0.14, 
                                  margin = margin(10,0,5,0)),
        plot.title.position = "panel",
        plot.subtitle = element_text(size = 16, 
                                     family = "Open Sans",
                                     hjust = 0.195,
                                     margin = margin(0,0,1,0)),
        plot.caption = element_text(size = 13.5, 
                                    family = "Open Sans",
                                    hjust = 0.14, 
                                    color = "grey50",
                                    margin = margin(0,0,0,0)),
        plot.caption.position = "panel")

ALTERNATIVE VIZ

The Economist’s map shows just a narrow time span of the evolution of living standards, and it does not allow to see which areas are better off worldwide either. Since the data provided by UN does contain all this information, I want to produce a chart that convey regional disparities and temporal evolution.

Additionally, as the HDI index is also produce separately for men and women, it is possible to compute the difference between them for each country and year (if available). In this way, a measure of the gender gap is generated as HDI male - HDI female, whose evolution in time can also be tracked. In this way, a gender perspective is added to the alternative visualization.

Preparing data

The UN’s dataset is in a wide format, with years added to the name of each variable, having more than 1000 columns. In order to produce a tidy dataset with only the required information, I compute it by joining 3 different sub dataframes (main index, male and female) by country and year.

HDI<- HDI %>% rename(ranking_2021 = hdi_rank_2021)

p <- HDI %>% 
  select(matches("^(hdi)(_)([0-9]+)$"), iso3, country, ranking_2021, region, hdicode) %>% 
  pivot_longer(contains("hdi_"), names_sep = "_", 
               names_to = c("hdi", "year"), values_to = "value")


f <- HDI %>% 
  select(contains("hdi_f"), country) %>% 
  pivot_longer(contains("hdi_"), names_to = c("year"), values_to = "hdi_f") %>% 
  mutate(year = str_remove(year, "hdi_f_"))

m <- HDI %>% 
  select(contains("hdi_m"), country) %>% 
  pivot_longer(contains("hdi_"), names_to = c("year"), values_to = "hdi_m") %>% 
  mutate(year = str_remove(year, "hdi_m_"))

hditotal <- left_join(p,m, by = c("year"="year", "country" = "country"))
hditotal <- left_join(hditotal,f, by = c("year"="year", "country" = "country"))

Once the data is ready, I generate the gender gap variable and convert year to numeric. In addition, I remove the observations related to aggregated areas () and create a Region variable based on the original one used to classify developing regions, where missing values correspond to developed countries.

hditotal <- hditotal %>% mutate(f_m_diff = hdi_m - hdi_f,
                                 year = as.numeric(year)) %>% 
  filter(!startsWith(iso3, "ZZ")) %>% 
  mutate(Region = case_when(
    region == "AS" ~ "Arab States", 
    region == "EAP" ~  "East Asia & Pacific",
    region == "ECA" ~ "Europe & Central Asia",
    region == "LAC" ~ "Latin America & Caribbean",
    region == "SA" ~ "South Asia",
    region == "SSA" ~  "Sub-Saharan Africa",
    TRUE ~ "Developed"))
hdi_levels<- c(0.5,0.7,0.8, 1)
hdi_text_levels <- c("Low", "Medium", "High", "Very High")
hdi_pos_levels <- c(0.48,0.68,0.78, 0.98)
a <- hditotal %>%  filter(year == 1990) %>% 
ggplot() +
  scale_y_continuous(limits= c(0.2,1), breaks = hdi_levels)+
  scale_x_continuous(limits = c(-0.08,0.28), expand = c(0,0))+
  geom_text(aes(0.24, 0.9, label=year), size= 11, color = "gray")+
  geom_hline(yintercept=c(0.5,0.7,0.8, 1), linetype="dashed", color = "grey")+
  geom_vline(xintercept=c(0), linetype="dashed", color = "grey50")+
  annotate("text", y=hdi_pos_levels, x=-0.08, 
           color="darkgray", vjust=1.3, hjust=0, size=5, 
           label=hdi_text_levels) +
  xlab("HDI Gender gap")+
  ylab("Human Development Index (HDI)")+
  theme(text = element_text(family = "Open Sans"),
        axis.line.y = element_blank(),
        axis.line.x = element_line(color = "gray50"),
        axis.ticks = element_blank(),
        axis.text = element_text(size = 11),
        axis.title = element_text(size = 14),
        axis.text.y.left = element_text(margin = margin(10,10,10,10)),
        panel.grid.major.x =element_line(linetype=1, color = "grey", size = 0.05),
        panel.grid.minor = element_blank(),
        panel.background = element_blank())+
  geom_point(aes(y = value, x = f_m_diff, color = Region, size = hdi_f), alpha = 0.7) + 
  scale_color_brewer(palette = "Set1", 
                     guide = guide_legend(title = "Region",
                                          override.aes = list(size=5),
                                          direction = "vertical",
                                          title.vjust = 1,
                                          title.hjust = 0,
                                          title.position = "top",
                                          order = 2,
                                          label.theme = element_text(size=12)))+
  scale_size_area(breaks = c(0.2,0.5,0.7,0.8, 1), 
                  max_size = 5,
                    guide = guide_legend(title = "HDI female",
                                         title.vjust = 1,
                                         title.hjust = 0.5,
                                         title.position = "top",
                                         direction = "vertical",
                                         label.position = "right",
                                         order=1))+
  #geom_point(aes(size=hdi_f, x = f_m_diff, y = value), color="#333333", shape=21)+ 
  labs(title = "Human Development Index and Gender gaps",
       subtitle = "Gender gap =  HDI male - HDI female") +
  theme(plot.title = element_text(face = "bold", size = 15),
        plot.subtitle = element_text(face = "italic", size = 14),
        plot.title.position = "plot",
        legend.key = element_blank(),
        legend.direction = "v")
a

countries <- c("Spain", "Afghanistan", "China", "Yemen", "India", "Niger", "Uruguay")

labels1990 <- hditotal %>% filter(country %in% countries, year == 1990)
  
a + geom_text(data = labels1990,aes(y = value, x = f_m_diff,label = country), 
              check_overlap = TRUE, 
              size = 4.5)

Adding motion

all_labels <- hditotal %>% filter(country %in% countries)

b <- a +geom_text(data = all_labels,aes(y = value, x = f_m_diff,label = country), 
                  check_overlap = TRUE, 
                  size = 4.5)

plotly::ggplotly(b %+% hditotal+ aes(frame=year))

And finally… a GIF

c <- ggplot(hditotal) +
  scale_y_continuous(limits= c(0.2,1), breaks = hdi_levels)+
  scale_x_continuous(limits = c(-0.08,0.28), expand = c(0,0))+
  geom_text(aes(0.24, 0.9, label= as.factor(year)), size= 6, color = "gray")+
  geom_hline(yintercept=c(0.5,0.7,0.8, 1), linetype="dashed", color = "grey")+
  geom_vline(xintercept=c(0), linetype="dashed", color = "grey50")+
  annotate("text", y=hdi_pos_levels, x=-0.08, 
           color="darkgray", vjust=1.3, hjust=0, size=3.5, 
           label=hdi_text_levels) +
  xlab("HDI Gender gap")+
  ylab("Human Development Index (HDI)")+
  theme(text = element_text(family = "Open Sans"),
        axis.line.y = element_blank(),
        axis.line.x = element_line(color = "gray50"),
        axis.ticks = element_blank(),
        axis.text = element_text(size = 9),
        axis.title = element_text(size = 10),
        axis.text.y.left = element_text(margin = margin(10,10,10,10)),
        panel.grid.major.x =element_line(linetype=1, color = "grey", size = 0.05),
        panel.grid.minor = element_blank(),
        panel.background = element_blank())+
  geom_point(aes(y = value, x = f_m_diff, color = Region, size = hdi_f), alpha = 0.6) + 
  scale_color_brewer(palette = "Set1", 
                     guide = guide_legend(title = "Region",
                                          override.aes = list(size=6),
                                          direction = "vertical",
                                          title.vjust = 1,
                                          title.hjust = 0,
                                          title.position = "top",
                                          order = 2,
                                          label.theme = element_text(size=10)))+
  scale_size_area(breaks = c(0.2,0.5,0.7,0.8, 1), 
                  max_size = 6,
                    guide = guide_legend(title = "HDI female",
                                         title.vjust = 1,
                                         title.hjust = 0.5,
                                         title.position = "top",
                                         direction = "vertical",
                                         label.position = "right",
                                         order=1))+
  labs(title = "Human Development Index and Gender gaps",
       subtitle = "Gender gap =  HDI male - HDI female") +
  theme(plot.title = element_text(face = "bold", size = 13),
        plot.subtitle = element_text(face = "italic", size = 11),
        plot.title.position = "plot",
        legend.key = element_blank(),
        legend.direction = "v") +
      transition_time(year) +
    ease_aes('linear')
animate(c, height = 400, width = 600)